A Genetic Algorithm for Data Reduction

Author

  • Lisa Henley
Abstract

When large amounts of data are available, choosing which variables to include in model building can be problematic. In this analysis, a subset of variables was required from a larger set. This subset was to be used in a later cluster analysis with the aim of extracting dimensions of human flourishing. A genetic algorithm (GA), written in SAS®, was used to select the subset of variables from the larger set according to their association with the dependent variable, life satisfaction. Life satisfaction was selected as a proxy for an as yet undefined quantity, human flourishing. The data were divided into subject areas (for example, health and environment). The GA was applied separately to each subject area to ensure adequate representation from each area when the human flourishing dimensions are defined in the future analysis.

GENETIC ALGORITHMS – A BRIEF INTRODUCTION

Genetic algorithms are iterative, heuristic (experience-based) search processes that can be used to find solutions to problems where an exhaustive search of all potential solutions would be impractical because of time or resource constraints. John Holland (1993) is credited with their invention in the early 1970s. They mimic natural evolution by using techniques such as natural selection, inheritance, mutation and crossover. They are used in fields and contexts where the optimal solution is required from within a large search space. This research attempts to find the best subset of variables to summarise a larger dataset.

Typically, a GA starts with an initial population. This initial population consists of chromosomes, each traditionally a string of zeros and ones, and is usually a randomly generated sample of the solution search space. For example, in a variable selection / reduction exercise with 30 variables to choose from, each chromosome will be 30 bits long. Each of these bits is called an allele, and each of the 30 alleles is a randomly generated 0 (indicating that the variable is not selected) or 1 (indicating that the variable is selected). Figure 1 is a visual example of two chromosomes, each 30 alleles long, from a population encoded to represent selection or non-selection of each of the 30 variables.

Figure 1. Example chromosomes
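The binary encoding described above can be sketched in a few lines. This is a minimal illustration in Python, not the SAS® program used in the paper: the variable names (`var01` to `var30`), the population size of two, the one-point crossover and the bit-flip mutation are all assumptions chosen for the example, since the text does not say which operators or settings the author used.

```python
import random

N_VARIABLES = 30  # matches the 30-variable example in the text


def random_chromosome(n_bits, rng):
    """Return n_bits alleles, each 0 (variable not selected) or 1 (selected)."""
    return [rng.randint(0, 1) for _ in range(n_bits)]


def initial_population(pop_size, n_bits, rng):
    """Randomly sample pop_size chromosomes from the search space."""
    return [random_chromosome(n_bits, rng) for _ in range(pop_size)]


def selected_variables(chromosome, names):
    """Decode a chromosome into the names of the variables it selects."""
    return [name for allele, name in zip(chromosome, names) if allele == 1]


def one_point_crossover(parent_a, parent_b, rng):
    """Assumed operator: swap the tails of two parents at a random cut point."""
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Assumed operator: flip each allele independently with probability rate."""
    return [1 - allele if rng.random() < rate else allele for allele in chromosome]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
names = [f"var{i:02d}" for i in range(1, N_VARIABLES + 1)]  # hypothetical names

# Two chromosomes, as in Figure 1.
population = initial_population(pop_size=2, n_bits=N_VARIABLES, rng=rng)
for chromosome in population:
    bits = "".join(str(allele) for allele in chromosome)
    print(bits, "->", len(selected_variables(chromosome, names)), "variables selected")

child_a, child_b = one_point_crossover(population[0], population[1], rng)
mutant = mutate(child_a, rate=0.05, rng=rng)
```

A fitness function is deliberately left out: in the paper, candidate subsets are judged by their association with life satisfaction, and the exact measure is not given in this text.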

Similar resources

Optimization of Reduction Settings and Inter-stand Tensions for Tandem Cold Mills using Genetic Algorithm

Cold rolling process is a complicated process which can be optimized by changing in variables and settings. This paper presents a set-up optimization system developed to calculate reductions and inter-stand tensions for each stand of a five stand tandem cold mill. The main objective in this optimization is minimization of power consumption. First, by using the analytical method, the equations ...

Applying Genetic Algorithm to EEG Signals for Feature Reduction in Mental Task Classification

Brain-Computer interface systems are a new mode of communication which provides a new path between brain and its surrounding by processing EEG signals measured in different mental states.  Therefore, choosing suitable features is demanded for a good BCI communication. In this regard, one of the points to be considered is feature vector dimensionality. We present a method of feature reduction us...

Optimal Placement of DGs in Distribution System including Different Load Models for Loss Reduction using Genetic Algorithm

Distributed generation (DG) sources are becoming more prominent in distribution systems due to the incremental demands for electrical energy. Locations and capacities of DG sources have great impacts on the system losses in a distribution network. This paper presents a study aimed for optimally determining the size and location of distributed generation units in distribution systems with differ...

Improvement of effort estimation accuracy in software projects using a feature selection approach

In recent years, utilization of feature selection techniques has become an essential requirement for processing and model construction in different scientific areas. In the field of software project effort estimation, the need to apply dimensionality reduction and feature selection methods has become an inevitable demand. The high volumes of data, costs, and time necessary for gathering data, ...

Adaptive Network-based Fuzzy Inference System-Genetic Algorithm Models for Prediction Groundwater Quality Indices: a GIS-based Analysis

The prediction of groundwater quality is very important for the management of water resources and environmental activities. The present study has integrated a number of methods such as Geographic Information Systems (GIS) and Artificial Intelligence (AI) methodologies to predict groundwater quality in Kerman plain (including HCO3-, concentrations and Electrical Conductivity (EC) of groundwater)...

Publication date: 2015